Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 44212225 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.3 GiB |
| Average record size in memory | 80.0 B |
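The overview counts can be reproduced on a small scale. The sketch below is a minimal illustration, not pandas-profiling's own code. It assumes the data is a list of row tuples, a hypothetical stand-in for the DataFrame, with `None` marking a missing cell. It counts missing cells and duplicate rows the way the table reports them. It also checks the memory arithmetic: the reported figures are consistent with each of the 10 columns taking 8 bytes per row. For the two text columns, that would be the object pointer only.

```python
from collections import Counter


def overview(rows):
    """Overview statistics for a list of equal-length row tuples.

    Assumption of this sketch: a cell is missing when it is None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Every extra copy of an identical row counts as one duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": 100 * missing / total_cells if total_cells else 0.0,
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100 * duplicates / n_obs if n_obs else 0.0,
    }


def size_in_gib(n_rows, bytes_per_row):
    """Memory footprint in GiB for a fixed number of bytes per row."""
    return n_rows * bytes_per_row / 2**30


rows = [
    ("2020-11-07", 268, 99),
    ("2020-11-07", 268, 128),
    ("2020-11-07", 268, 99),    # identical to the first row
    ("2020-11-05", 212, None),  # one missing cell
]
print(overview(rows))
# 10 columns x 8 bytes = 80 bytes per record (assumed layout)
print(f"{size_in_gib(44212225, 80):.1f} GiB")  # 3.3 GiB
```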
Variable types
| Categorical | 2 |
|---|---|
| Numeric | 8 |
Warnings
| PRODUCTID has a high cardinality: 1310 distinct values | High cardinality |
|---|---|
| CURRENTPRICE is highly correlated with LISTPRICE | High correlation |
| LISTPRICE is highly correlated with CURRENTPRICE | High correlation |
| DATE is highly correlated with LOCATIONID | High correlation |
| UNITSSOLD is highly correlated with LISTPRICE and 1 other field | High correlation |
| LISTPRICE is highly correlated with UNITSSOLD and 1 other field | High correlation |
| CURRENTPRICE is highly correlated with UNITSSOLD and 1 other field | High correlation |
| LOCATIONID is highly correlated with DATE | High correlation |
| AVAILABLEQUANTITY is highly skewed (γ1 = 219.0416833) | Skewed |
| SALEPRICE is highly skewed (γ1 = 256.1790504) | Skewed |
| UNITSRESTOCKED is highly skewed (γ1 = 694.084977) | Skewed |
| AVAILABLEQUANTITY has 41347982 (93.5%) zeros | Zeros |
| SALEPRICE has 44050686 (99.6%) zeros | Zeros |
| UNITSRESTOCKED has 44097689 (99.7%) zeros | Zeros |
| UNITSSOLD has 24122772 (54.6%) zeros | Zeros |
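The alert messages above follow a simple pattern. The sketch below rebuilds the cardinality, skewness and zeros messages from per-column summary numbers. The thresholds `cardinality_threshold`, `skew_threshold` and `zeros_threshold` are hypothetical parameters chosen for illustration. They are not pandas-profiling's configuration keys or defaults.

```python
def alerts(name, n_obs, *, n_zeros=None, skewness=None, n_distinct=None,
           zeros_threshold=0.05, skew_threshold=20.0, cardinality_threshold=50):
    """Return (message, tag) pairs for one column.

    All three thresholds are illustrative assumptions of this sketch.
    """
    out = []
    if n_distinct is not None and n_distinct > cardinality_threshold:
        out.append((f"{name} has a high cardinality: {n_distinct} distinct values",
                    "High cardinality"))
    if skewness is not None and abs(skewness) > skew_threshold:
        out.append((f"{name} is highly skewed (γ1 = {skewness})", "Skewed"))
    if n_zeros is not None and n_zeros / n_obs > zeros_threshold:
        pct = 100 * n_zeros / n_obs
        out.append((f"{name} has {n_zeros} ({pct:.1f}%) zeros", "Zeros"))
    return out


N = 44212225
for msg, tag in alerts("AVAILABLEQUANTITY", N, n_zeros=41347982,
                       skewness=219.0416833):
    print(f"| {msg} | {tag} |")
```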
Reproduction
| Analysis started | 2021-06-20 20:57:05.703968 |
|---|---|
| Analysis finished | 2021-06-21 01:19:02.337283 |
| Duration | 4 hours, 21 minutes and 56.63 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
DATE
Categorical
| Distinct | 37 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 337.3 MiB |
| 2021-03-09 | 2813952 |
|---|---|
| 2021-03-13 | 2793472 |
| 2020-11-07 | 2302548 |
| 2021-02-23 | 2252800 |
| 2021-01-18 | 2244608 |
| Other values (32) | 31804845 |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
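The length statistics are plain summaries of the string lengths. Below is a minimal sketch using the standard `statistics` module. The input is a few PRODUCTID values taken from the sample rows. For PRODUCTID, the reported mean length can also be checked as total characters divided by the number of rows.

```python
import statistics


def length_stats(values):
    """Max, median, mean and min of the string lengths in `values`."""
    lengths = [len(v) for v in values]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
    }


sample = ["prod9280136", "prod3470051", "prod10030207", "prod2090108"]
print(length_stats(sample))

# For PRODUCTID the mean length equals total characters / number of rows.
print(round(491705068 / 44212225, 8))  # 11.12147303
```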
Characters and Unicode
| Total characters | 442122250 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
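Python's standard `unicodedata` module exposes the general category of each code point. That is enough to reproduce the character and category tables for this variable on a small sample. The long category names (`Decimal Number` for `Nd`, and so on) are mapped by hand here. `unicodedata` has no script or block lookup, so this sketch does not cover those two tables.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this dataset.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Ll": "Lowercase Letter",
}


def character_profile(values):
    """Count characters and Unicode general categories across all strings."""
    chars = Counter(ch for v in values for ch in v)
    categories = Counter()
    for ch, n in chars.items():
        code = unicodedata.category(ch)
        categories[CATEGORY_NAMES.get(code, code)] += n
    return chars, categories


chars, cats = character_profile(["2020-11-07", "2021-03-09"])
total = sum(chars.values())
for ch, n in chars.most_common(3):
    print(ch, n, f"{100 * n / total:.1f}%")
print(dict(cats))
```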
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2020-11-07 |
|---|---|
| 2nd row | 2020-11-07 |
| 3rd row | 2020-11-07 |
| 4th row | 2020-11-07 |
| 5th row | 2020-11-07 |
Common Values
| Value | Count | Frequency (%) |
| 2021-03-09 | 2813952 | 6.4% |
| 2021-03-13 | 2793472 | 6.3% |
| 2020-11-07 | 2302548 | 5.2% |
| 2021-02-23 | 2252800 | 5.1% |
| 2021-01-18 | 2244608 | 5.1% |
| 2021-03-31 | 2114796 | 4.8% |
| 2021-02-28 | 2093056 | 4.7% |
| 2021-02-07 | 2079976 | 4.7% |
| 2020-12-05 | 1990891 | 4.5% |
| 2020-12-08 | 1947341 | 4.4% |
| Other values (27) | 21578785 | 48.8% |
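A common-values table lists the k most frequent values. The rest are folded into one "Other values (n)" row, where n is the number of remaining distinct values and the count is the number of remaining rows. The sketch below uses `collections.Counter`. The function name and the `k` parameter are illustrative. The sketch also counts "unique" values, meaning values that occur exactly once.

```python
from collections import Counter


def common_values(values, k=10):
    """Return the top-k (value, count, pct) rows plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(k)
    rows = [(v, n, f"{100 * n / total:.1f}%") for v, n in top]
    rest_distinct = len(counts) - len(top)
    rest_count = total - sum(n for _, n in top)
    if rest_distinct:
        rows.append((f"Other values ({rest_distinct})", rest_count,
                     f"{100 * rest_count / total:.1f}%"))
    n_unique = sum(1 for n in counts.values() if n == 1)  # values seen once
    return rows, n_unique


# Hypothetical miniature of the DATE column.
dates = ["2021-03-09"] * 5 + ["2021-03-13"] * 3 + ["2020-11-07"] * 2 + ["2021-01-18"]
rows, n_unique = common_values(dates, k=2)
for row in rows:
    print(row)
print("unique:", n_unique)
```

For DATE with k = 10, the remainder is 44212225 − 22633440 = 21578785 rows spread over 27 dates. This matches the last row of the table.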
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 117956760 | 26.7% |
| 0 | 111941456 | 25.3% |
| - | 88424450 | 20.0% |
| 1 | 70259068 | 15.9% |
| 3 | 23069134 | 5.2% |
| 7 | 6765481 | 1.5% |
| 8 | 6597175 | 1.5% |
| 5 | 6421781 | 1.5% |
| 6 | 4325294 | 1.0% |
| 9 | 3696880 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 353697800 | 80.0% |
| Dash Punctuation | 88424450 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 117956760 | 33.3% |
| 0 | 111941456 | 31.6% |
| 1 | 70259068 | 19.9% |
| 3 | 23069134 | 6.5% |
| 7 | 6765481 | 1.9% |
| 8 | 6597175 | 1.9% |
| 5 | 6421781 | 1.8% |
| 6 | 4325294 | 1.2% |
| 9 | 3696880 | 1.0% |
| 4 | 2664771 | 0.8% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 88424450 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 442122250 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 117956760 | 26.7% |
| 0 | 111941456 | 25.3% |
| - | 88424450 | 20.0% |
| 1 | 70259068 | 15.9% |
| 3 | 23069134 | 5.2% |
| 7 | 6765481 | 1.5% |
| 8 | 6597175 | 1.5% |
| 5 | 6421781 | 1.5% |
| 6 | 4325294 | 1.0% |
| 9 | 3696880 | 0.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 442122250 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 117956760 | 26.7% |
| 0 | 111941456 | 25.3% |
| - | 88424450 | 20.0% |
| 1 | 70259068 | 15.9% |
| 3 | 23069134 | 5.2% |
| 7 | 6765481 | 1.5% |
| 8 | 6597175 | 1.5% |
| 5 | 6421781 | 1.5% |
| 6 | 4325294 | 1.0% |
| 9 | 3696880 | 0.8% |
LOCATIONID
Real number (ℝ≥0)
| Distinct | 623 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 21808.07437 |
| Minimum | 110 |
|---|---|
| Maximum | 69101 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 110 |
|---|---|
| 5-th percentile | 321 |
| Q1 | 10412 |
| median | 11911 |
| Q3 | 45010 |
| 95-th percentile | 60128 |
| Maximum | 69101 |
| Range | 68991 |
| Interquartile range (IQR) | 34598 |
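The quantile block can be reproduced with `statistics.quantiles`. Its `method="inclusive"` option interpolates linearly at position p·(n − 1). That is the same rule as the default `interpolation="linear"` of `pandas.Series.quantile`. This sketch assumes, without confirmation from the report, that pandas-profiling computes the percentiles exactly this way.

```python
import statistics


def quantile_stats(data):
    """Quantile statistics as listed in the report (linear interpolation)."""
    # 19 cut points at 5%, 10%, ..., 95%; needs at least two data points.
    cuts = statistics.quantiles(data, n=20, method="inclusive")
    q1, median, q3 = cuts[4], cuts[9], cuts[14]
    lo, hi = min(data), max(data)
    return {
        "Minimum": lo,
        "5-th percentile": cuts[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95-th percentile": cuts[18],
        "Maximum": hi,
        "Range": hi - lo,
        "Interquartile range (IQR)": q3 - q1,
    }


print(quantile_stats([1, 2, 3, 4, 10]))
```

As a check against the table: Range = 69101 − 110 = 68991 and IQR = 45010 − 10412 = 34598.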
Descriptive statistics
| Standard deviation | 19169.41152 |
|---|---|
| Coefficient of variation (CV) | 0.8790052344 |
| Kurtosis | -0.5259138488 |
| Mean | 21808.07437 |
| Median Absolute Deviation (MAD) | 1746 |
| Skewness | 0.9912693172 |
| Sum | 9.641834908 × 10¹¹ |
| Variance | 367466338.1 |
| Monotonicity | Not monotonic |
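The descriptive rows are linked by simple formulas:

* The coefficient of variation is standard deviation / mean.
* The variance is the squared standard deviation.
* The MAD is the median of the absolute deviations from the median. Its value of 0 for the zero-heavy columns rules out a mean-based definition.

The stdlib sketch below assumes the sample (n − 1) standard deviation, which is the pandas default. It simplifies monotonicity to non-strict increasing or decreasing order of the rows.

```python
import statistics


def descriptive_stats(data):
    """Descriptive statistics, assuming the sample (n - 1) standard deviation."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    med = statistics.median(data)
    increasing = all(a <= b for a, b in zip(data, data[1:]))
    decreasing = all(a >= b for a, b in zip(data, data[1:]))
    return {
        "Standard deviation": sd,
        "Coefficient of variation (CV)": sd / mean,
        "Mean": mean,
        # Median of absolute deviations from the median (not the mean deviation).
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - med) for x in data),
        "Sum": sum(data),
        "Variance": statistics.variance(data),
        # Simplified: non-strict ordering only, no "strictly" variants.
        "Monotonicity": ("Increasing" if increasing else
                         "Decreasing" if decreasing else "Not monotonic"),
    }


print(descriptive_stats([4, 2, 9, 4, 5, 7, 4, 5]))
```

For LOCATIONID, 19169.41152 / 21808.07437 ≈ 0.879, which matches the CV row.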
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 10134 | 142494 | 0.3% |
| 10130 | 140019 | 0.3% |
| 10126 | 140019 | 0.3% |
| 10127 | 140019 | 0.3% |
| 10120 | 140019 | 0.3% |
| 10136 | 140019 | 0.3% |
| 10137 | 140019 | 0.3% |
| 10162 | 138382 | 0.3% |
| 10160 | 127290 | 0.3% |
| 10171 | 127290 | 0.3% |
| Other values (613) | 42836655 | 96.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 110 | 50916 | 0.1% |
| 120 | 63645 | 0.1% |
| 121 | 57296 | 0.1% |
| 129 | 50916 | 0.1% |
| 131 | 50916 | 0.1% |
| 133 | 38187 | 0.1% |
| 134 | 38187 | 0.1% |
| 141 | 38187 | 0.1% |
| 145 | 12729 | < 0.1% |
| 160 | 50916 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 69101 | 109627 | 0.2% |
| 68101 | 101832 | 0.2% |
| 66101 | 114561 | 0.3% |
| 65201 | 101832 | 0.2% |
| 65200 | 114561 | 0.3% |
| 65190 | 101832 | 0.2% |
| 65144 | 101832 | 0.2% |
| 65143 | 101832 | 0.2% |
| 65142 | 111242 | 0.3% |
| 65140 | 89103 | 0.2% |
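The last two tables list the smallest and largest distinct values with their counts. They are the value counts sorted by value rather than by frequency. The sketch below uses `collections.Counter`, and `k=10` mirrors the ten rows shown. The input list is hypothetical, although its values are taken from the LOCATIONID tables.

```python
from collections import Counter


def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    ordered = sorted(counts.items())  # sort by value, not by frequency
    smallest = ordered[:k]
    largest = ordered[::-1][:k]
    return smallest, largest


location_ids = [268, 212, 110, 120, 268, 69101, 110, 65140]
low, high = extreme_values(location_ids, k=3)
print(low)   # [(110, 2), (120, 1), (212, 1)]
print(high)  # [(69101, 1), (65140, 1), (268, 2)]
```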
SKUID
Real number (ℝ≥0)
| Distinct | 12729 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 102744136.2 |
| Minimum | 3371671 |
|---|---|
| Maximum | 124400049 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 3371671 |
|---|---|
| 5-th percentile | 3795994 |
| Q1 | 107403584 |
| median | 114806206 |
| Q3 | 117251763 |
| 95-th percentile | 122175059 |
| Maximum | 124400049 |
| Range | 121028378 |
| Interquartile range (IQR) | 9848179 |
Descriptive statistics
| Standard deviation | 33836561.73 |
|---|---|
| Coefficient of variation (CV) | 0.3293283977 |
| Kurtosis | 4.547442146 |
| Mean | 102744136.2 |
| Median Absolute Deviation (MAD) | 2594805 |
| Skewness | -2.521182011 |
| Sum | 4.542546865 × 10¹⁵ |
| Variance | 1.14491291 × 10¹⁵ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 113775042 | 3486 | < 0.1% |
| 114751105 | 3486 | < 0.1% |
| 106700050 | 3484 | < 0.1% |
| 107379376 | 3484 | < 0.1% |
| 122875593 | 3484 | < 0.1% |
| 117403197 | 3484 | < 0.1% |
| 3674979 | 3484 | < 0.1% |
| 114750169 | 3483 | < 0.1% |
| 104456921 | 3483 | < 0.1% |
| 117250330 | 3483 | < 0.1% |
| Other values (12719) | 44177384 | 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 3371671 | 3474 | < 0.1% |
| 3371672 | 3472 | < 0.1% |
| 3371673 | 3471 | < 0.1% |
| 3371677 | 3472 | < 0.1% |
| 3371678 | 3469 | < 0.1% |
| 3371679 | 3477 | < 0.1% |
| 3407894 | 3474 | < 0.1% |
| 3407895 | 3470 | < 0.1% |
| 3407897 | 3475 | < 0.1% |
| 3407898 | 3471 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 124400049 | 3473 | < 0.1% |
| 124400047 | 3470 | < 0.1% |
| 124400045 | 3474 | < 0.1% |
| 124400044 | 3471 | < 0.1% |
| 124400043 | 3473 | < 0.1% |
| 123950059 | 3476 | < 0.1% |
| 123950058 | 3472 | < 0.1% |
| 123950057 | 3471 | < 0.1% |
| 123950052 | 3481 | < 0.1% |
| 123950048 | 3474 | < 0.1% |
PRODUCTID
Categorical
| Distinct | 1310 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 337.3 MiB |
| prod8610600 | 326237 |
|---|---|
| prod8551592 | 281751 |
| prod2020012 | 251865 |
| prod8551591 | 248348 |
| prod8351133 | 245167 |
| Other values (1305) | 42858857 |
Length
| Max length | 12 |
|---|---|
| Median length | 11 |
| Mean length | 11.12147303 |
| Min length | 10 |
Characters and Unicode
| Total characters | 491705068 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | prod9280136 |
|---|---|
| 2nd row | prod3470051 |
| 3rd row | prod9390007 |
| 4th row | prod9890061 |
| 5th row | prod9280332 |
Common Values
| Value | Count | Frequency (%) |
| prod8610600 | 326237 | 0.7% |
| prod8551592 | 281751 | 0.6% |
| prod2020012 | 251865 | 0.6% |
| prod8551591 | 248348 | 0.6% |
| prod8351133 | 245167 | 0.6% |
| prod8360162 | 237883 | 0.5% |
| prod8610597 | 236552 | 0.5% |
| prod2810229 | 209786 | 0.5% |
| prod9710083 | 208408 | 0.5% |
| prod8780491 | 198649 | 0.4% |
| Other values (1300) | 41767579 | 94.5% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 90309716 | 18.4% |
| p | 44212225 | 9.0% |
| r | 44212225 | 9.0% |
| o | 44212225 | 9.0% |
| d | 44212225 | 9.0% |
| 9 | 39086138 | 7.9% |
| 1 | 32061467 | 6.5% |
| 8 | 29367077 | 6.0% |
| 2 | 27019427 | 5.5% |
| 6 | 20568350 | 4.2% |
| Other values (4) | 76443993 | 15.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 314856168 | 64.0% |
| Lowercase Letter | 176848900 | 36.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 90309716 | 28.7% |
| 9 | 39086138 | 12.4% |
| 1 | 32061467 | 10.2% |
| 8 | 29367077 | 9.3% |
| 2 | 27019427 | 8.6% |
| 6 | 20568350 | 6.5% |
| 3 | 20193665 | 6.4% |
| 5 | 19867698 | 6.3% |
| 7 | 18476341 | 5.9% |
| 4 | 17906289 | 5.7% |
Lowercase Letter
| Value | Count | Frequency (%) |
| p | 44212225 | 25.0% |
| r | 44212225 | 25.0% |
| o | 44212225 | 25.0% |
| d | 44212225 | 25.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 314856168 | 64.0% |
| Latin | 176848900 | 36.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 90309716 | 28.7% |
| 9 | 39086138 | 12.4% |
| 1 | 32061467 | 10.2% |
| 8 | 29367077 | 9.3% |
| 2 | 27019427 | 8.6% |
| 6 | 20568350 | 6.5% |
| 3 | 20193665 | 6.4% |
| 5 | 19867698 | 6.3% |
| 7 | 18476341 | 5.9% |
| 4 | 17906289 | 5.7% |
Latin
| Value | Count | Frequency (%) |
| p | 44212225 | 25.0% |
| r | 44212225 | 25.0% |
| o | 44212225 | 25.0% |
| d | 44212225 | 25.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 491705068 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 90309716 | 18.4% |
| p | 44212225 | 9.0% |
| r | 44212225 | 9.0% |
| o | 44212225 | 9.0% |
| d | 44212225 | 9.0% |
| 9 | 39086138 | 7.9% |
| 1 | 32061467 | 6.5% |
| 8 | 29367077 | 6.0% |
| 2 | 27019427 | 5.5% |
| 6 | 20568350 | 4.2% |
| Other values (4) | 76443993 | 15.5% |
AVAILABLEQUANTITY
Real number (ℝ≥0)
| Distinct | 275 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2131558409 |
| Minimum | 0 |
|---|---|
| Maximum | 1822 |
| Zeros | 41347982 |
| Zeros (%) | 93.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 1822 |
| Range | 1822 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.639859138 |
|---|---|
| Coefficient of variation (CV) | 7.693240452 |
| Kurtosis | 192153.5216 |
| Mean | 0.2131558409 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 219.0416833 |
| Sum | 9424094 |
| Variance | 2.689137993 |
| Monotonicity | Not monotonic |
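Skewness and kurtosis explain the "highly skewed" alerts. A column that is almost all zeros, with a few large values, has very large third and fourth standardized moments. The sketch below uses the bias-adjusted sample estimators documented for `pandas.Series.skew` and `pandas.Series.kurt`. The latter reports excess kurtosis, so a normal distribution gives about 0. This sketch assumes, without confirmation from the report, that pandas-profiling reports exactly these estimators.

```python
import math


def _central_moment(data, k):
    """k-th central moment with a 1/n factor."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Bias-adjusted sample skewness G1 (needs n >= 3, non-constant data)."""
    n = len(data)
    g1 = _central_moment(data, 3) / _central_moment(data, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis G2 (needs n >= 4, non-constant data)."""
    n = len(data)
    g2 = _central_moment(data, 4) / _central_moment(data, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# Hypothetical zero-inflated column: 999 zeros and one large value.
zero_heavy = [0] * 999 + [500]
print(skewness(zero_heavy), excess_kurtosis(zero_heavy))
print(skewness([1, 2, 3, 4, 5]), excess_kurtosis([1, 2, 3, 4, 5]))
```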
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0 | 41347982 | 93.5% |
| 1 | 908743 | 2.1% |
| 2 | 668073 | 1.5% |
| 3 | 457784 | 1.0% |
| 4 | 305099 | 0.7% |
| 5 | 176737 | 0.4% |
| 6 | 103881 | 0.2% |
| 7 | 63167 | 0.1% |
| 8 | 42679 | 0.1% |
| 9 | 28691 | 0.1% |
| Other values (265) | 109389 | 0.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 41347982 | 93.5% |
| 1 | 908743 | 2.1% |
| 2 | 668073 | 1.5% |
| 3 | 457784 | 1.0% |
| 4 | 305099 | 0.7% |
| 5 | 176737 | 0.4% |
| 6 | 103881 | 0.2% |
| 7 | 63167 | 0.1% |
| 8 | 42679 | 0.1% |
| 9 | 28691 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1822 | 1 | < 0.1% |
| 1687 | 1 | < 0.1% |
| 1684 | 2 | < 0.1% |
| 1682 | 1 | < 0.1% |
| 1675 | 2 | < 0.1% |
| 898 | 1 | < 0.1% |
| 667 | 1 | < 0.1% |
| 631 | 1 | < 0.1% |
| 549 | 1 | < 0.1% |
| 473 | 2 | < 0.1% |
SALEPRICE
Real number (ℝ≥0)
| Distinct | 38 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.004348706721 |
| Minimum | 0 |
|---|---|
| Maximum | 182 |
| Zeros | 44050686 |
| Zeros (%) | 99.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 182 |
| Range | 182 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.09123389259 |
|---|---|
| Coefficient of variation (CV) | 20.97954598 |
| Kurtosis | 389108.1135 |
| Mean | 0.004348706721 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 256.1790504 |
| Sum | 192266 |
| Variance | 0.008323623158 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=38)
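The histograms use fixed-width bins. Most columns use 50 bins. SALEPRICE, LISTPRICE and UNITSSOLD use 38, 49 and 34, which are exactly their numbers of distinct values. That pattern is consistent with the bin count being capped at the number of distinct values. The sketch below encodes this as an assumption, with `max_bins=50` as a hypothetical parameter name.

```python
def fixed_width_histogram(data, max_bins=50):
    """Equal-width histogram over [min, max].

    Assumption: the number of bins is min(max_bins, number of distinct values).
    Returns (edges, counts) with len(edges) == len(counts) + 1.
    """
    lo, hi = min(data), max(data)
    bins = max(1, min(max_bins, len(set(data))))
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        # The last bin is closed on the right so the maximum is included.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    return edges, counts


sale_prices = [0, 0, 0, 0, 1, 1, 2, 9]  # hypothetical values
edges, counts = fixed_width_histogram(sale_prices)
print(len(counts), counts)
```

With 4 distinct values the sketch uses 4 bins of width 2.25, so the zeros, ones and twos share the first bin.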
Common values
| Value | Count | Frequency (%) |
| 0 | 44050686 | 99.6% |
| 1 | 142307 | 0.3% |
| 2 | 13999 | < 0.1% |
| 3 | 2975 | < 0.1% |
| 4 | 1088 | < 0.1% |
| 5 | 504 | < 0.1% |
| 6 | 250 | < 0.1% |
| 7 | 131 | < 0.1% |
| 8 | 88 | < 0.1% |
| 9 | 51 | < 0.1% |
| Other values (28) | 146 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 44050686 | 99.6% |
| 1 | 142307 | 0.3% |
| 2 | 13999 | < 0.1% |
| 3 | 2975 | < 0.1% |
| 4 | 1088 | < 0.1% |
| 5 | 504 | < 0.1% |
| 6 | 250 | < 0.1% |
| 7 | 131 | < 0.1% |
| 8 | 88 | < 0.1% |
| 9 | 51 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 182 | 1 | < 0.1% |
| 72 | 1 | < 0.1% |
| 66 | 1 | < 0.1% |
| 64 | 1 | < 0.1% |
| 44 | 2 | < 0.1% |
| 40 | 2 | < 0.1% |
| 38 | 1 | < 0.1% |
| 37 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
UNITSRESTOCKED
Real number (ℝ≥0)
| Distinct | 88 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.004878424463 |
| Minimum | 0 |
|---|---|
| Maximum | 406 |
| Zeros | 44097689 |
| Zeros (%) | 99.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 406 |
| Range | 406 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.2057122327 |
|---|---|
| Coefficient of variation (CV) | 42.16776016 |
| Kurtosis | 1100510.526 |
| Mean | 0.004878424463 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 694.084977 |
| Sum | 215686 |
| Variance | 0.04231752269 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0 | 44097689 | 99.7% |
| 1 | 77220 | 0.2% |
| 2 | 18520 | < 0.1% |
| 3 | 7951 | < 0.1% |
| 4 | 4152 | < 0.1% |
| 5 | 1971 | < 0.1% |
| 6 | 1317 | < 0.1% |
| 7 | 801 | < 0.1% |
| 8 | 588 | < 0.1% |
| 9 | 394 | < 0.1% |
| Other values (78) | 1622 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 44097689 | 99.7% |
| 1 | 77220 | 0.2% |
| 2 | 18520 | < 0.1% |
| 3 | 7951 | < 0.1% |
| 4 | 4152 | < 0.1% |
| 5 | 1971 | < 0.1% |
| 6 | 1317 | < 0.1% |
| 7 | 801 | < 0.1% |
| 8 | 588 | < 0.1% |
| 9 | 394 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 406 | 1 | < 0.1% |
| 394 | 1 | < 0.1% |
| 392 | 1 | < 0.1% |
| 263 | 1 | < 0.1% |
| 198 | 1 | < 0.1% |
| 196 | 1 | < 0.1% |
| 194 | 1 | < 0.1% |
| 172 | 1 | < 0.1% |
| 133 | 1 | < 0.1% |
| 128 | 1 | < 0.1% |
CURRENTPRICE
Real number (ℝ≥0)
| Distinct | 67 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 73.18668237 |
| Minimum | 7 |
|---|---|
| Maximum | 299 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 49 |
| median | 68 |
| Q3 | 98 |
| 95-th percentile | 128 |
| Maximum | 299 |
| Range | 292 |
| Interquartile range (IQR) | 49 |
Descriptive statistics
| Standard deviation | 35.73652885 |
|---|---|
| Coefficient of variation (CV) | 0.4882927836 |
| Kurtosis | 2.177769492 |
| Mean | 73.18668237 |
| Median Absolute Deviation (MAD) | 20 |
| Skewness | 0.9355204537 |
| Sum | 3235746068 |
| Variance | 1277.099495 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 128 | 3772361 | 8.5% |
| 58 | 3589811 | 8.1% |
| 68 | 3137423 | 7.1% |
| 98 | 2844978 | 6.4% |
| 49 | 2769499 | 6.3% |
| 69 | 2725760 | 6.2% |
| 59 | 2725515 | 6.2% |
| 39 | 2500704 | 5.7% |
| 88 | 2030531 | 4.6% |
| 118 | 1967877 | 4.5% |
| Other values (57) | 16147766 | 36.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 7 | 4797 | < 0.1% |
| 8 | 166713 | 0.4% |
| 9 | 496643 | 1.1% |
| 10 | 3475 | < 0.1% |
| 11 | 3474 | < 0.1% |
| 12 | 138918 | 0.3% |
| 14 | 332198 | 0.8% |
| 15 | 6945 | < 0.1% |
| 16 | 24480 | 0.1% |
| 18 | 283718 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 299 | 3480 | < 0.1% |
| 298 | 7280 | < 0.1% |
| 248 | 149135 | 0.3% |
| 244 | 911 | < 0.1% |
| 238 | 3473 | < 0.1% |
| 228 | 83852 | 0.2% |
| 199 | 1821 | < 0.1% |
| 198 | 115488 | 0.3% |
| 179 | 11148 | < 0.1% |
| 178 | 29252 | 0.1% |
LISTPRICE
Real number (ℝ≥0)
| Distinct | 49 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 90.08619421 |
| Minimum | 8 |
|---|---|
| Maximum | 598 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 8 |
|---|---|
| 5-th percentile | 38 |
| Q1 | 58 |
| median | 88 |
| Q3 | 118 |
| 95-th percentile | 148 |
| Maximum | 598 |
| Range | 590 |
| Interquartile range (IQR) | 60 |
Descriptive statistics
| Standard deviation | 38.92280054 |
|---|---|
| Coefficient of variation (CV) | 0.4320617702 |
| Kurtosis | 4.577780375 |
| Mean | 90.08619421 |
| Median Absolute Deviation (MAD) | 30 |
| Skewness | 0.9698655916 |
| Sum | 3982911088 |
| Variance | 1514.984402 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=49)
Common values
| Value | Count | Frequency (%) |
| 128 | 6546467 | 14.8% |
| 58 | 5514415 | 12.5% |
| 98 | 5464857 | 12.4% |
| 68 | 5042626 | 11.4% |
| 88 | 4021604 | 9.1% |
| 118 | 3561788 | 8.1% |
| 108 | 2187571 | 4.9% |
| 48 | 2008986 | 4.5% |
| 78 | 1967571 | 4.5% |
| 148 | 1141788 | 2.6% |
| Other values (39) | 6754552 | 15.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 8 | 166713 | 0.4% |
| 10 | 3475 | < 0.1% |
| 11 | 3474 | < 0.1% |
| 12 | 138918 | 0.3% |
| 14 | 298771 | 0.7% |
| 15 | 6945 | < 0.1% |
| 16 | 38209 | 0.1% |
| 18 | 607840 | 1.4% |
| 20 | 86836 | 0.2% |
| 22 | 72923 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 598 | 3480 | < 0.1% |
| 398 | 3471 | < 0.1% |
| 348 | 3472 | < 0.1% |
| 298 | 52095 | 0.1% |
| 268 | 3473 | < 0.1% |
| 248 | 218842 | 0.5% |
| 238 | 3473 | < 0.1% |
| 228 | 163280 | 0.4% |
| 198 | 461916 | 1.0% |
| 178 | 152795 | 0.3% |
UNITSSOLD
Real number (ℝ≥0)
| Distinct | 34 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.56164179 |
| Minimum | 0 |
|---|---|
| Maximum | 299 |
| Zeros | 24122772 |
| Zeros (%) | 54.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 337.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 49 |
| 95-th percentile | 89 |
| Maximum | 299 |
| Range | 299 |
| Interquartile range (IQR) | 49 |
Descriptive statistics
| Standard deviation | 32.66404021 |
|---|---|
| Coefficient of variation (CV) | 1.277853765 |
| Kurtosis | 0.4258587312 |
| Mean | 25.56164179 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.01469604 |
| Sum | 1130137058 |
| Variance | 1066.939523 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=34)
Common values
| Value | Count | Frequency (%) |
| 0 | 24122772 | 54.6% |
| 49 | 2769499 | 6.3% |
| 69 | 2725760 | 6.2% |
| 59 | 2725515 | 6.2% |
| 39 | 2500704 | 5.7% |
| 79 | 1788788 | 4.0% |
| 29 | 1600686 | 3.6% |
| 89 | 1366068 | 3.1% |
| 19 | 1060321 | 2.4% |
| 99 | 762788 | 1.7% |
| Other values (24) | 2789324 | 6.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 24122772 | 54.6% |
| 7 | 4797 | < 0.1% |
| 9 | 496643 | 1.1% |
| 14 | 159628 | 0.4% |
| 19 | 1060321 | 2.4% |
| 24 | 161279 | 0.4% |
| 29 | 1600686 | 3.6% |
| 34 | 422595 | 1.0% |
| 39 | 2500704 | 5.7% |
| 44 | 374443 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 299 | 3480 | < 0.1% |
| 244 | 911 | < 0.1% |
| 199 | 1821 | < 0.1% |
| 179 | 11148 | < 0.1% |
| 169 | 27586 | 0.1% |
| 159 | 21906 | < 0.1% |
| 149 | 12062 | < 0.1% |
| 139 | 121512 | 0.3% |
| 134 | 45112 | 0.1% |
| 129 | 3496 | < 0.1% |
Correlations
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. r is invariant under separate changes in location and positive changes in scale of the two variables. As a result, any exact linear relationship with a non-zero slope gives r = +1 or -1, however steep the line is. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
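The definition translates directly into code: covariance divided by the product of the standard deviations. The pure-Python sketch below relies on the 1/(n − 1) factors cancelling, so sums of products are enough. The sample values are the first five rows of LISTPRICE and CURRENTPRICE.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


list_price = [198, 128, 128, 128, 28]
current_price = [99, 128, 128, 99, 19]
r = pearson_r(list_price, current_price)
print(round(r, 3))
# Shifting and positively rescaling y leaves r unchanged.
print(round(pearson_r(list_price, [3 * v + 7 for v in current_price]), 3))
```

A negative scale factor would flip the sign of r instead.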
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
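Spearman's ρ is Pearson's r applied to ranks. The sketch below gives tied values the average of their positions. This is the usual convention and also the default of `pandas.Series.rank`. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0: monotonic but nonlinear
```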
Kendall's τ
Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2.
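Counting pairs directly gives the tie-free form of τ, often called τ_a. This O(n²) sketch is for illustration only. Several columns in this dataset are dominated by zeros and so contain many tied values. For such data, the tie-corrected τ_b, which `scipy.stats.kendalltau` computes by default, will differ from τ_a.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6
```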
Phik (φk)
Phik (φk) is a new and practical correlation coefficient. It works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
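The nullity count is the number of non-missing cells per column. The nullity matrix marks, row by row, which cells are present. The stdlib sketch below assumes a hypothetical layout, with rows stored as dictionaries and `None` as the missing marker. For this dataset every column would report all 44212225 rows, since there are no missing cells.

```python
def nullity_counts(rows, columns):
    """Number of non-missing (not None) cells per column."""
    return {c: sum(row.get(c) is not None for row in rows) for c in columns}


def nullity_matrix(rows, columns):
    """One string per row: '#' for a present cell, '.' for a missing one."""
    return ["".join("#" if row.get(c) is not None else "." for c in columns)
            for row in rows]


cols = ["DATE", "LOCATIONID", "UNITSSOLD"]
rows = [
    {"DATE": "2020-11-07", "LOCATIONID": 268, "UNITSSOLD": 99.0},
    {"DATE": "2020-11-07", "LOCATIONID": None, "UNITSSOLD": 0.0},
]
print(nullity_counts(rows, cols))  # {'DATE': 2, 'LOCATIONID': 1, 'UNITSSOLD': 2}
print(nullity_matrix(rows, cols))  # ['###', '#.#']
```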
First rows
| | DATE | LOCATIONID | SKUID | PRODUCTID | AVAILABLEQUANTITY | SALEPRICE | UNITSRESTOCKED | CURRENTPRICE | LISTPRICE | UNITSSOLD |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-11-07 | 268 | 113800939 | prod9280136 | 0 | 0.0 | 0 | 99 | 198 | 99.0 |
| 1 | 2020-11-07 | 268 | 3777889 | prod3470051 | 0 | 0.0 | 0 | 128 | 128 | 0.0 |
| 2 | 2020-11-07 | 268 | 107376607 | prod9390007 | 0 | 0.0 | 0 | 128 | 128 | 0.0 |
| 3 | 2020-11-07 | 268 | 117377502 | prod9890061 | 0 | 0.0 | 0 | 99 | 128 | 99.0 |
| 4 | 2020-11-07 | 268 | 113775041 | prod9280332 | 0 | 0.0 | 0 | 19 | 28 | 19.0 |
| 5 | 2020-11-07 | 268 | 101777165 | prod8430902 | 0 | 0.0 | 0 | 98 | 98 | 0.0 |
| 6 | 2020-11-07 | 268 | 115650177 | prod9370109 | 0 | 0.0 | 0 | 98 | 98 | 0.0 |
| 7 | 2020-11-07 | 268 | 122825362 | prod9710083 | 0 | 0.0 | 0 | 58 | 58 | 0.0 |
| 8 | 2020-11-07 | 268 | 122851265 | prod10030207 | 0 | 0.0 | 0 | 68 | 68 | 0.0 |
| 9 | 2020-11-07 | 268 | 117400738 | prod8780491 | 0 | 0.0 | 0 | 88 | 88 | 0.0 |
Last rows
| | DATE | LOCATIONID | SKUID | PRODUCTID | AVAILABLEQUANTITY | SALEPRICE | UNITSRESTOCKED | CURRENTPRICE | LISTPRICE | UNITSSOLD |
|---|---|---|---|---|---|---|---|---|---|---|
| 44212215 | 2020-11-05 | 212 | 117376453 | prod8555454 | 0 | 0.0 | 0 | 58 | 58 | 0.0 |
| 44212216 | 2020-11-05 | 212 | 117401608 | prod10120018 | 0 | 0.0 | 0 | 64 | 88 | 64.0 |
| 44212217 | 2020-11-05 | 212 | 3764063 | prod8300206 | 0 | 0.0 | 0 | 128 | 128 | 0.0 |
| 44212218 | 2020-11-05 | 212 | 118925434 | prod9960764 | 0 | 0.0 | 0 | 168 | 168 | 0.0 |
| 44212219 | 2020-11-05 | 212 | 3821704 | prod8351133 | 0 | 0.0 | 0 | 118 | 118 | 0.0 |
| 44212220 | 2020-11-05 | 212 | 104457186 | prod8470023 | 0 | 0.0 | 0 | 118 | 118 | 0.0 |
| 44212221 | 2020-11-05 | 212 | 3705917 | prod8351063 | 0 | 0.0 | 0 | 44 | 44 | 0.0 |
| 44212222 | 2020-11-05 | 212 | 112900039 | prod9200027 | 0 | 0.0 | 0 | 98 | 98 | 0.0 |
| 44212223 | 2020-11-05 | 212 | 117226881 | prod10000024 | 0 | 0.0 | 0 | 49 | 88 | 49.0 |
| 44212224 | 2020-11-05 | 212 | 107383031 | prod2090108 | 0 | 0.0 | 0 | 68 | 68 | 0.0 |